Advances in 3D ultrasound imaging technology have revealed associations between early placental volume and growth outcomes. However, a more detailed evaluation of placental shape remains elusive and could potentially lead to new quantitative metrics for fetal growth and outcome prediction. The goal of this project is to determine if 3D ultrasound-derived morphological characteristics of the placenta in the first trimester of pregnancy are predictive of whether a baby will be characterized as “small for gestational age” (in the 10th percentile for fetal birth weight relative to gestational age). We explore new custom 3D metrics of placenta morphology beyond the volume estimates and 2D measurements that are currently available on commercial scanners.
Nadav Schwartz, M.D., Associate Professor of Maternal Fetal Medicine - Dr. Schwartz is clinically trained in maternal fetal medicine and has expertise in the area of 3D ultrasound imaging in obstetrics. He is leading a U01-funded project whose goal is to develop new multi-modal imaging techniques for analyzing human placental morphology and perfusion in vivo. The U01 includes interdisciplinary teams in computational image analysis, ultrasound, MRI, and optical imaging. Dr. Schwartz provided education in the clinical motivation for this project, suggested placenta shape metrics that may potentially be associated with growth outcomes, and coordinated the image acquisition and verification of manual image segmentations performed for this project.
Paul Yushkevich, Ph.D., Associate Professor of Radiology - Dr. Yushkevich is trainined in mathematics and computer science and specializes in computational shape and image analysis. He developed the framework for deformable medial modeling that was adapted to 3D shape analysis of the placenta implemented in this project. He provided insight into derivation of placenta shape metrics and the machine learning methods that were used to create predictive models for low fetal birth weight. His specific suggestions were related to feature selection and use of logistic regression and random forest classifiers, cautioning against “double dipping” in the feature selection and model evaluation steps. He also provided advice related to working with small datasets.
James Gee, Ph.D., Associate Professor of Radiology and Computer and Information Science - Dr. Gee is trained in computer science and specializes in computational image analysis and machine learning for diverse medical imaging applications. He provided insight into potential sources of the discrepancy in placenta volume measurements that were observed in this project, and made recommendations related to combining maternal and fetal characteristics and image-derived features into predictive models.
The placenta serves many critical functions in maintaining a healthy pregnancy, including nutritional, immune, and hormonal support. Previous work indicates that placental size is significantly associated with perinatal outcomes such as birth weight, fetal growth disorders, preeclampsia and fetal demise. However, research related to in vivo placental morphology has been limited in terms of validation and rudimentary shape indices. The goal of this project is to determine if there are 3D morphological features of the placenta, in combination with maternal and fetal characteristics, that are predictive of low fetal birth weight. The objective is to analyze these features during the first trimester of pregnancy when processes critical to placenta pathophysiology are believed to occur.
This project is an interdisciplinary collaboration between the Departments of Maternal Fetal Medicine and Radiology, and combines expertise in maternal fetal medicine, computer and information science, ultrasound imaging, and statistics. It requires clinical expertise for image acquisition and knowledge of pregnancy and placenta development, and technical expertise for extracting image features and generating predictive models using image-derived placenta, maternal, and fetal characteristics.
The data set used in this study was acquired by the Department of Maternal Fetal Medicine at Penn and includes first-trimester placenta measurements and patient characteristics of 60 subjects with an outcome variable related to low fetal birth weight. We recognize that this is an extremely small dataset to develop and evaluate a predictive algorithm, but expect that the number of available datasets will increase by a factor of 10 when automated image segmentation algorithms have been validated. (We currently have >500 cases but are currently limited by the need to manually segment the placenta in 3D ultrasound images in order to extract custom shape metrics.)
Please note that the spreadsheets provided are dummy spreadsheets and may not match the narrative text and final HTML document submitted to Canvas.
This section gives an overview of image analysis performed, exploratory analysis of non-placenta features, feature selection based on importance and correlation analysis, and methods of model creation and evaluation.
3D ultrasound images were collected from over 500 patients during the first trimester of pregnancy. The volume of the placenta in each image had been previously estimated using GE’s “VOCAL” software package. Of these cases, the placenta was manually segmented in 60 images that were associated with a primary outcome variable related to low fetal birth weight. A trained researcher manually traced each placenta in ITK-Snap, an open-source software package for 3D medical image segmentation. An expert clinicial (Dr. Schwartz) verified the accuracy of each segmentation. The placenta segmentations were individually modeled using 3D deformable modeling using continuous medial axis representation (cm-rep), which establishes a shape-based coordinate system on each image segmentation such that standardized morphological features can be automatically computed. 25 measurements were automically derived from these models using custom Matlab functions, which included volume of the manual image segmentation, maternal and fetal surface areas of the placenta, maximum and mean placenta thickness, and the minumim/mean/maximum of the placenta diameter implemented three different ways. These shape features are individually defined in the data dictionary in the following section.
The data used in this study include maternal and fetal characteristics of the 60 subjects who underwent a 3D ultrasound exam during the first trimester of pregnancy at Penn, as well as morphological features that were obtained by manual image segmentation and deformable modeling of the placenta in the 3D ultrasound images. In addition, the data include placenta volume estimates that were made with a GE commercial software package called VOCAL. The latter were included to compare placental volumes that were obtained by manual image segmentation in ITK-Snap. The primary outcome variable is referred to as sga_10th, which indicates membership to the 10th percentile of the “small for gestational age” group, which relates to a measurement of birth weight relative to the gestational age.
The data is read from three spreadsheets:
Relevant variables are converted to numeric data or factors and renamed as necessary. An inner join is performed based on the study_id variable and assigned to the data frame df.merge. Below is a list of variables contained in df.merge. In the description of placenta shape features, the “least-squares plane” refers to the best-fit plane through the rim of the placenta.
| Non-Placenta Variable | Description |
|---|---|
| model_id | model ID number (cm-rep model of the placenta) |
| study_id | study ID number |
| race | race (1 - white, 2 - black, 3 - asian) |
| wtscrn | maternal weight at 1st screening visit (kg) |
| height | maternal height (in) |
| sbpscrn | maternal systolic blood pressure at 1st screening visit (mmHg) |
| crl | crown rump length of the fetus, measured in 2D ultrasound (mm) |
| pappamom | concentration of the maternal serum marker PAPP-A measured during the first trimester of pregnancy, related to fetal Down syndrome (mIU/mL) |
| fetal_sex | fetal sex (0 = male, 1 = female) |
| maternal_age_US1 | maternal age at the first ultrasound exam (years) |
| gest_age_US1 | gestational age at the first ultrasound exam (days) |
| sga_10th | membership to the 10th percentile of birth weight normalized to gestational age (primary outcome variable) |
| Placenta Shape Variable | Description |
|---|---|
| Vsnap | placenta volume measured by manual 3D ultrasound image segmentation in ITK-Snap (cc) |
| SA_fetal | area of the chorionic (fetal) surface of the placenta (mm2) |
| SA_maternal | area of the maternal surface of the placenta (mm2) |
| SA_f2m | ratio of the fetal and maternal surface areas of the placenta |
| SA_maternal_nga | SA_maternal divided by gestational age (mm2/days) |
| SA_fetal_nga | SA_maternal divided by gestational age (mm2/days) |
| Tcmrep_max | maximum placenta thickness (mm) |
| Tcmrep_mean | mean placenta thickness (mm) |
| circumference | length of the edge of the placenta (mm) |
| edge_maxheight | maximum distance of the placenta edge from a least-squares plane fit to the placenta edge (mm) |
| edge_minheight | minimum distance of the placenta edge from a least-squares plane fit to the placenta edge (mm) |
| diameter_2D_mean | mean diameter of the placenta, measured from a 2D projection of the placenta onto the least-squares plane (mm) |
| diameter_2D_max | maximum diameter of the placenta, measured from a 2D projection of the placenta onto the least-squares plane (mm) |
| diameter_2D_min | minimum diameter of the placenta, measured from a 2D projection of the placenta onto the least-squares plane (mm) |
| diameter_2D_std | standard deviation of the diameter of the placenta, measured from a 2D projection of the placenta onto the least-squares plane (mm) |
| surf_diameter_mean | mean placenta diameter, where diameter is measured from edge-to-edge across the maternal surface of the placenta (mm) |
| surf_diameter_max | maximum placenta diameter, where diameter is measured from edge-to-edge across the maternal surface of the placenta (mm) |
| surf_diameter_min | minimum placenta diameter, where diameter is measured from edge-to-edge across the maternal surface of the placenta (mm) |
| surf_diameter_std | standard deviation of the placenta diameter, where diameter is measured from edge-to-edge across the maternal surface of the placenta (mm) |
| diameter_3D_mean | mean 3D edge-to-edge placenta diameter (mm) |
| diameter_3D_max | maximum edge-to-edge 3D placenta diameter (mm) |
| diameter_3D_min | minimum edge-to-edge 3D placenta diameter (mm) |
| diameter_3D_std | standard deviation of edge-to-edge 3D placenta diameter (mm) |
| medial_surface_height_mean | mean height of the medial placental surface from the least-squares plane (mm) |
| medial_surface_height_max | maximum height of the medial placental surface from the least-squares plane (mm) |
| medial_surface_height_min | minimum height of the medial placental surface from the least-squares plane (mm) |
| sphericity | ratio of placenta volume to the volume of a sphere with the same radius |
Below is the script used for data import into R:
library(readxl, quietly = TRUE, warn.conflicts = FALSE)
library(tidyverse, quietly = TRUE, warn.conflicts = FALSE)## ── Attaching packages ────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0 ✔ purrr 0.2.5
## ✔ tibble 1.4.2 ✔ dplyr 0.7.8
## ✔ tidyr 0.8.2 ✔ stringr 1.3.1
## ✔ readr 1.1.1 ✔ forcats 0.3.0
## ── Conflicts ───────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
#setwd("/Users/alison/BMIN503_placenta")
# Read in the entire clinical sheet (DUMMY)
df.clinical.vars=read_xlsx("non_placenta_measures_dummy.xlsx")
# Select variables wanted
df.clinical.vars <- df.clinical.vars %>%
select(model_id = "Model number",
study_id = "Study ID",
race = "RACE",
wtscrn = "WTSCRN (kg)",
height = "HT (in)",
sbpscrn = "SBPSCRN",
crl = "CRL (mm)",
pappamom = "PAPPAMOM",
fetal_sex = "FETSEX1",
maternal_age_US1 = "Maternal Age at US1",
gest_age_US1 = "GA US1",
sga_10th = "SGA<10th%")
# List of maternal and fetal characteristics
clinical.vars.wanted = c('wtscrn','height','race','sbpscrn','crl','pappamom','fetal_sex','maternal_age_US1','gest_age_US1')
# Convert relevant columns to numeric
variables.numeric <- c("race","wtscrn","height","sbpscrn","crl","pappamom","fetal_sex","maternal_age_US1","sga_10th")
df.clinical.vars[,names(df.clinical.vars) %in% variables.numeric] <- sapply(lapply(df.clinical.vars[,names(df.clinical.vars) %in% variables.numeric],as.character),as.numeric)
# Read VOCAL measurements (DUMMY)
df.vocal=read_xlsx("vocal_measures_dummy.xlsx")
df.vocal <- df.vocal %>%
mutate(study_id=as.numeric(`Study ID`)) %>%
rename(Vvocal=VolumeA) %>%
select(c(study_id, Vvocal))
# Read 3DUS measurements and convert units if necessary (DUMMY)
measures3D.wanted = c('Vsnap','SA_fetal','SA_maternal','Tcmrep_max','Tcmrep_mean','circumference','edge_maxheight','edge_minheight','diameter_2D_mean','diameter_2D_min','diameter_2D_max','diameter_2D_std','surf_diameter_mean','surf_diameter_min','surf_diameter_max','surf_diameter_std','diameter_3D_mean','diameter_3D_min','diameter_3D_max','diameter_3D_std','medial_surface_height_max','medial_surface_height_mean','medial_surface_height_min','SA_f2m','SA_maternal_nga','SA_fetal_nga','sphericity')
df.morph3D=read.csv("placenta_measures_dummy.csv")
df.morph3D <- df.morph3D %>%
rename(model_id=Model) %>%
mutate(Vsnap=Vmanual/1000) %>%
mutate(Tcmrep_mean=thickness_mean/10) %>%
mutate(Tcmrep_max=thickness_max/10) %>%
mutate(SA_f2m=Fetal.to.maternal.SA) %>%
mutate(SA_maternal_nga=GA.norm.Maternal.SA) %>%
mutate(SA_fetal_nga=GA.norm.Fetal.SA) %>%
select(c('model_id',measures3D.wanted))
# Merge clinical data, vocal measurements, and 3DUS measurements
df.merge <- inner_join(df.clinical.vars,df.vocal,by="study_id") %>%
filter(model_id %in% seq(1,60,1)) %>%
inner_join(df.morph3D,by="model_id") %>%
mutate(race = factor(race, levels=c(1,2,3), labels=c("white","black","asian"))) %>%
mutate(fetal_sex = factor(fetal_sex, levels=c(0,1), labels=c("male","female"))) %>%
mutate(sga_10th = factor(sga_10th, levels=c(0,1), labels=c("no","yes"))) Exploratory data analysis is performed on the material and fetal (non-placenta) characteristics. Plots of each characteristic are made with respect to the dependent variable, sga_10th.
library(GGally, quietly = TRUE, warn.conflicts = FALSE)
library(ggplot2, quietly = TRUE, warn.conflicts = FALSE)
library(cowplot, quietly = TRUE, warn.conflicts = FALSE)##
##
## *******************************************************
## Note: cowplot does not change the default ggplot2 theme
## anymore. To recover the previous behavior, execute:
## theme_set(theme_cowplot())
## *******************************************************
library(ggthemes, quietly = TRUE, warn.conflicts = FALSE)
library(plotly, quietly = TRUE, warn.conflicts = FALSE)
# Maternal characteristics relative to sga_10th
m1 <- ggplot(data=df.merge, aes(x=race, fill=sga_10th)) +
geom_bar(position="stack") +
labs(x="race",y="count") +
theme(legend.position = "bottom",
legend.title = element_text(size=8),
legend.text = element_text(size=8))
m2 <- ggplot(data=df.merge, aes(x=sga_10th,y=sbpscrn)) +
geom_boxplot() +
labs(x="sga_10th",y="SBP (bpm)") +
theme(legend.position="none")
m3 <- ggplot(data=df.merge, aes(x=sga_10th,y=height)) +
geom_boxplot() +
labs(x="sga_10th",y="maternal height (in)") +
theme(legend.position="none")
m4 <- ggplot(data=df.merge, aes(x=sga_10th,y=wtscrn)) +
geom_boxplot() +
labs(x="sga_10th",y="maternal weight (kg)") +
theme(legend.position="none")
m5 <- ggplot(data=df.merge, aes(x=sga_10th,y=maternal_age_US1)) +
geom_boxplot() +
labs(x="sga_10th",y="maternal age (years)") +
theme(legend.position="none")
m6 <- ggplot(data=df.merge, aes(x=sga_10th,y=pappamom)) +
geom_boxplot() +
labs(x="sga_10th",y="PAPP-A level (mIU/mL)") +
theme(legend.position="none")
#plot_grid(m1, m2, m3, m4, m5, m6, ncol = 3, labels=NULL,title())
pm <- plot_grid(m1, m2, m3, m4, m5, m6, ncol = 3, labels=NULL)
pm.title <- ggdraw() +
draw_label("Maternal Characteristics",
fontface = 'bold')
plot_grid(pm.title, pm, ncol = 1, rel_heights = c(0.1, 1)) # rel_heights values control title margins# fetal characteristics relative to sga_10th
f1 <- ggplot(data=df.merge, aes(x=fetal_sex, fill=sga_10th)) +
geom_bar(position="stack") +
labs(x="fetal sex",y="count")
f2 <- ggplot(data=df.merge, aes(x=sga_10th,y=gest_age_US1)) +
geom_boxplot() +
labs(x="sga_10th",y="gest. age (days)")
f3 <- ggplot(data=df.merge, aes(x=sga_10th,y=crl)) +
geom_boxplot() +
labs(x="sga_10th",y="crown rump length (mm)")
pf <- plot_grid(f1, f2, f3, ncol = 2, labels=NULL)
pf.title <- ggdraw() +
draw_label("Fetal Characteristics",
fontface = 'bold')
plot_grid(pf.title, pf, ncol = 1, rel_heights = c(0.1, 1)) # rel_heights values control title marginsSince two outliers are noted in the maternal systolic blood pressure (SBP), the data frame df.merge is updated to change the values of these outliers to NA. A boxplot of systolic blood pressure is generated again (below) to check the distribution of values in the sga_10th positive and negative groups:
df.merge$sbpscrn <- sapply(df.merge$sbpscrn,function(x){if (x > 750) NA else x})
m <- ggplot(data=df.merge, aes(x=sga_10th,y=sbpscrn),pi) +
geom_boxplot() +
labs(x="sga_10th",y="systolic blood pressure (mmHg)") +
theme(legend.position="none")
plot(m, )We expect many of the placenta shape measurements to be correlated since there are various ways of measuring morphological features of the placenta. For example, placenta diameter can be measured multiple ways (from a 2D projection, from edge-to-edge in 3D, or across the maternal surface of the placenta), and it is not known which technique is most clinically meaningful. We also suspect that surface area measurements may be highly correlated with placenta volume, so the next step is to reduce the number of morphological measurements to those that are likely most important. We first eliminate shape features based on importance metrics, and then further eliminate features that are significantly correlated and have low variance. The importance metrics are evaluated first because reducing features based on correlation analysis alone was less straightforward. (In the Results section, we describe potential drawbacks of performing feature selection in this manner.)
The variable names of all the placenta shape features are given in the vector measures3D.wanted. These measurements are first normalized since they have different units and scales. The measurements are then ranked with respect to importance by creating (1) univariate linear regression models between each measurement and the sga_10th outcome variable and (2) a random forest model using all measurements as features. The p-values from linear regression are listed in ascending order and compared to the Gini importance scores listed in descending order, shown in the table below.
library(randomForest, quietly = TRUE, warn.conflicts = FALSE)## randomForest 4.6-14
## Type rfNews() to see new features/changes/bug fixes.
library(knitr)
library(kableExtra)
# First all variables are normalized
normalize <- function(x) {
return ((x - min(x,na.rm=TRUE)) / (max(x,na.rm=TRUE) - min(x,na.rm=TRUE)))
}
df.merge[measures3D.wanted] <- as.data.frame(sapply(df.merge[measures3D.wanted], normalize))
# List p-values for univariate analyses
pvals.glm <- function(feature_vec){
pvals <- vector()
frm <- sga_10th ~ v
for(i in 1:length(feature_vec)){
frm <- as.formula(paste("sga_10th ~",feature_vec[i]))
v.glm <- glm(formula=frm, data=df.merge, family=binomial())
pvals[i] <- coef(summary(v.glm))[2,4]
}
pvals.glm <- data.frame("LM variable"=feature_vec[feature_vec!="sga_10th"],"pvals"=pvals) %>%
arrange(desc(1-pvals))
}
pvals.glm.shape <- pvals.glm(measures3D.wanted)
# Gini values from random forest classifier
gini.rf <- function(feature_vec){
measures.rf <- randomForest(sga_10th ~ ., data=df.merge[c(feature_vec,'sga_10th')], ntree=100, importance=TRUE, na.action=na.omit)
rf.pred <- predict(measures.rf, df.merge[c(feature_vec,'sga_10th')], type="response")
rf.pred.binary <- ifelse(rf.pred == "yes",2,1)
gini.rf <- data.frame("RF variable"=feature_vec,"MeanDecreaseGini"=measures.rf$importance[,4],row.names=NULL) %>%
arrange(desc(MeanDecreaseGini))
}
df.gini.shape <- gini.rf(measures3D.wanted)
df.imp.shape <- cbind(pvals.glm.shape,df.gini.shape)
kable(head(df.imp.shape, n=10L), align = "c", caption="Placenta features ranked by importance (LM and RF models)") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 11)| LM.variable | pvals | RF.variable | MeanDecreaseGini |
|---|---|---|---|
| medial_surface_height_min | 0.1264464 | diameter_3D_max | 0.4886253 |
| Tcmrep_mean | 0.2039138 | edge_maxheight | 0.4639860 |
| medial_surface_height_mean | 0.2538770 | Tcmrep_mean | 0.4461688 |
| diameter_2D_max | 0.2863267 | diameter_2D_mean | 0.4312137 |
| diameter_3D_max | 0.2983572 | edge_minheight | 0.3920935 |
| edge_maxheight | 0.3908171 | diameter_3D_mean | 0.3884795 |
| diameter_2D_std | 0.4218593 | surf_diameter_min | 0.3732333 |
| sphericity | 0.4218670 | SA_f2m | 0.3711645 |
| diameter_3D_std | 0.4250725 | diameter_2D_std | 0.3663905 |
| surf_diameter_mean | 0.5214830 | medial_surface_height_min | 0.3462402 |
# List of measurements reduced based on p-values and Gini scores
shape.imp.reduce <- c('surf_diameter_min','surf_diameter_max','surf_diameter_mean','sphericity','edge_maxheight', 'edge_minheight','Tcmrep_max','medial_surface_height_max','SA_fetal','Vsnap')The number of placenta shape measurements is reduced based on p-value and Gini importance score, and the final shape variable names are stored in the vector shape.imp.reduce. The table below shows the top placenta shape variables selected next to the component of placenta morphology that they represent:
| Variable | Morphological feature type |
|---|---|
| surf_diameter_min | maternal surface size |
| surf_diameter_max | maternal surface size |
| surf_diameter_mean | maternal surface size |
| Tcmrep_mean | placenta thickness |
| Tcmrep_max | placenta thickness |
| edge_max_height | flatness of the placenta rim |
| edge_min_height | flatness of the placenta rim |
| sphericity | placenta convexity |
| medial_surface_height_max | placenta convexity |
Statistically significant pairwise correlations between these shape variables are identified by creating a correlogram and eliminating variables that are significantly correlated and have the lowest variance. The variance measurements are listed in the table below the correlogram. We notice, for example, that Vsnap and SA_fetal are strongly correlated to several other shape variables but have low variance and importance scores, so they are eliminated. The variable edge_maxheight has a strong inverse relationship to edge_minheight, so edge_minheight is eliminated since it has the lower variance of the two. The variables surf_diameter_mean and surf_diameter_min are significantly correlated to surf_diameter_max and medial_surface_height_max and have relatively low variance, so they are removed from the feature set as well. The variance of each shape feature is given in the table below. The reduced set of features are specified in the vector shape.cor.reduce.
library(Hmisc, quietly = TRUE, warn.conflicts = FALSE)
library(corrplot, quietly = TRUE, warn.conflicts = FALSE)## corrplot 0.84 loaded
# correlation of shape features
cor.shape.reduce <- cor(as.matrix(df.merge[shape.imp.reduce]))
# test for significant correlation between variables
cor.mtest <- function(mat, ...) {
mat <- as.matrix(mat)
n <- ncol(mat)
p.mat<- matrix(NA, n, n)
diag(p.mat) <- 0
for (i in 1:(n - 1)) {
for (j in (i + 1):n) {
tmp <- cor.test(mat[, i], mat[, j], ...)
p.mat[i, j] <- p.mat[j, i] <- tmp$p.value
}
}
colnames(p.mat) <- rownames(p.mat) <- colnames(mat)
p.mat
}
# matrix of correlation p-values
p.mat.shape <- cor.mtest(as.matrix(df.merge[shape.imp.reduce]))
# correlogram of placenta shape features
corrplot(cor.shape.reduce, method="circle",type="upper",tl.col="black", tl.srt=45,p.mat = as.matrix(p.mat.shape), sig.level = 0.01)# significantly correlated shape features
shape.vars.sig.cor = c("surf_diameter_min","surf_diameter_mean","surf_diameter_max","medial_surface_height_max","edge_minheight","edge_maxheight","SA_fetal","Vsnap","Tcmrep_max")
# variance of each variable
shape.vars.variance <- vector()
for(i in 1:length(shape.vars.sig.cor)){
shape.vars.variance[i] <- round(var(df.merge[shape.vars.sig.cor[i]]),5)
}
df.shape.variance <- data.frame(cbind(shape.vars.sig.cor,shape.vars.variance)) %>%
arrange(desc(shape.vars.variance))
names(df.shape.variance) <- c("placenta feature","variance")
kable(df.shape.variance,align = "c") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),full_width = FALSE, font_size = 11)| placenta feature | variance |
|---|---|
| medial_surface_height_max | 0.1149 |
| edge_minheight | 0.09557 |
| edge_maxheight | 0.08976 |
| surf_diameter_max | 0.08363 |
| surf_diameter_mean | 0.07984 |
| Vsnap | 0.07533 |
| surf_diameter_min | 0.07412 |
| Tcmrep_max | 0.06928 |
| SA_fetal | 0.0586 |
# list of shape measurements selected based on importance level and correlation analysis
shape.cor.reduce <- c('surf_diameter_max','sphericity','edge_maxheight','Tcmrep_max','medial_surface_height_max')The non-placenta features are likewise normalized and ranked with respect to increasing p-value and decreasing Gini score based on univariate linear regression and random forest modeling, respectively. A correlogram of the fetal and maternal characteristics is generated. The variance of the significantly correlated features (crown rump length and gestational age) are given, and the final reduced set of non-placenta features are listed in the vector vars.clinical.reduce after gestational age is removed for having lower variance and importance metrics than crown rump length.
# normalize non-placenta features
df.merge[clinical.vars.wanted][!sapply(df.merge[clinical.vars.wanted],is.factor)] <- as.data.frame(sapply(df.merge[clinical.vars.wanted][!sapply(df.merge[clinical.vars.wanted],is.factor)], normalize))
pvals.glm.clinical <- pvals.glm(clinical.vars.wanted)
df.gini.clinical <- gini.rf(clinical.vars.wanted)
df.imp.clinical <- cbind(pvals.glm.clinical,df.gini.clinical)
kable(head(df.imp.clinical, n=10L), align = "c", caption="Non-placenta features ranked by importance (LM and RF models)") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size = 11)| LM.variable | pvals | RF.variable | MeanDecreaseGini |
|---|---|---|---|
| race | 0.1159713 | gest_age_US1 | 1.3212484 |
| gest_age_US1 | 0.1366997 | crl | 1.2336239 |
| crl | 0.1577564 | height | 1.1376785 |
| wtscrn | 0.3706302 | wtscrn | 0.8942445 |
| fetal_sex | 0.4074021 | pappamom | 0.8548171 |
| pappamom | 0.4875376 | sbpscrn | 0.7386144 |
| sbpscrn | 0.5041777 | maternal_age_US1 | 0.7108813 |
| height | 0.5064734 | race | 0.4232555 |
| maternal_age_US1 | 0.7680605 | fetal_sex | 0.0931364 |
# correlation of fetal and maternal characteristics
cor.clinical <- cor(data.matrix(df.merge[clinical.vars.wanted]),use="na.or.complete")
# matrix of correlation p-values
p.mat.clinical <- cor.mtest(data.matrix(df.merge[clinical.vars.wanted]))
# correlogram for fetal and maternal characteristics
corrplot(cor.clinical, method="circle",type="upper",tl.col="black", tl.srt=45,p.mat = as.matrix(p.mat.clinical), sig.level = 0.01)# significantly correlated maternal and fetal characteristics
clinical.vars.sig.cor = c("crl","gest_age_US1")
# variance of each correlated variable
clinical.vars.variance <- vector()
for(i in 1:length(clinical.vars.sig.cor)){
clinical.vars.variance[i] <- round(var(df.merge[clinical.vars.sig.cor[i]]),5)
}
#print(data.frame("non-placenta feature"=clinical.vars.sig.cor,"variance"=clinical.vars.variance) %>%
# arrange(desc(variance)))
df.clinical.variance <- data.frame(cbind(clinical.vars.sig.cor,clinical.vars.variance)) %>%
arrange(desc(clinical.vars.variance))
names(df.clinical.variance) <- c("non-placenta feature","variance")
kable(df.clinical.variance,align = "c") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),full_width = FALSE, font_size = 11)| non-placenta feature | variance |
|---|---|
| crl | 0.09054 |
| gest_age_US1 | 0.07077 |
# reduced set of non-placenta features
clinical.vars.cor.reduce = c("wtscrn","height","race","sbpscrn","pappamom","fetal_sex","crl","maternal_age_US1")
# combined set of shape and non-shape features (reduced)
vars.all.cor.reduce = c(shape.cor.reduce,clinical.vars.cor.reduce)Below are the final features selected for predictive model creation:
| Placenta shape variable | Description |
|---|---|
| surf_diameter_max | maximum maternal surface diameter |
| Tcmrep_max | maximum placenta thickness |
| edge_max_height | maximum “height” of the placenta rim |
| sphericity | placenta sphericity |
| medial_surface_height_max | non-flatness of the placenta |
| Non-placenta variable | Description |
|---|---|
| wtscrn | maternal weight |
| height | maternal height |
| race | maternal race |
| sbscrn | maternal systolic blood pressure |
| pappamom | maternal serum biomarker |
| fetal_sex | sex of the fetus |
| crl | crown rump length of the fetus |
| maternal_age_US1 | maternal age at ultrasound exam |
The following models are evaluated, each of which uses sga_10th as the primary outcome variable: + Logistic regression and random forest models with only the reduced set of placenta shape measurements as predictors + Logistic regression and random forest models with only the reduced set of non-placenta shape features (uncorrelated maternal and fetal characteristics) as predictors + Logistic regression and random forest models with the reduced set of placenta and non-placenta features as predictors
For each of the three models above, the model is first trained using all variables and then using 5-fold cross-validation. Because of the small sample size, 5-fold cross validation is used to ensure that several examples of sga_10th positive cases are sampled in test data. ROC analysis is performed for each of the models created. The function for model evaluation is referred to as model.eval below.
library(pROC, quietly = TRUE, warn.conflicts = FALSE)## Type 'citation("pROC")' for a citation.
library(boot, quietly = TRUE, warn.conflicts = FALSE)
model.eval <- function(df.vars.model,roc_title){
glm.reduce.alltrain <- glm(sga_10th ~ ., data=df.vars.model, family=binomial(logit))
glm.reduce.alltrain.pred <- predict.glm(glm.reduce.alltrain, df.vars.model, type="response")
glm.reduce.alltrain.binary <- ifelse(glm.reduce.alltrain.pred < 0.5,1,2)
rf.reduce.alltrain <- randomForest(as.formula("sga_10th ~."),data=df.vars.model, ntree=100, na.action=na.omit, importance=TRUE)
rf.reduce.alltrain.pred <- predict(rf.reduce.alltrain, df.vars.model, type="response")
rf.reduce.alltrain.binary <- ifelse(rf.reduce.alltrain.pred == "yes",2,1)
#K-Fold Cross Validation
N = nrow(df.vars.model)
K = 5
set.seed(1234)
s = sample(1:K, size=N, replace=T)
pred.outputs.glm <- vector(mode="numeric", length=N)
pred.outputs.rf <- vector(mode="numeric", length=N)
obs.outputs <- vector(mode="numeric", length=N)
offset <- 0
for(i in 1:K){
train <- filter(df.vars.model, s != i)
test <- filter(df.vars.model, s == i)
obs.outputs[1:length(s[s==i]) + offset] <- test$sga_10th
#GLM train/test
glm <- glm(sga_10th ~ ., data=train, family=binomial(logit),na.action=na.omit)
glm.pred.curr <- predict(glm, test, type="response")
pred.outputs.glm[1:length(s[s==i]) + offset] <- glm.pred.curr
# RF train/test
rf <- randomForest(sga_10th ~ ., data=train, ntree=100, na.action=na.omit)
rf.pred.curr <- predict(rf, newdata=test, type="prob")
pred.outputs.rf[1:length(s[s==i]) + offset] <- rf.pred.curr[,2]
offset <- offset + length(s[s==i])
}
# Logistic regression
roc.training.glm <- roc(df.vars.model$sga_10th, glm.reduce.alltrain.binary, ci=TRUE)
roc.cv.glm <- roc(obs.outputs, pred.outputs.glm, ci=TRUE)
plot.roc(df.vars.model$sga_10th, glm.reduce.alltrain.binary, col="black", ci=TRUE, main=roc_title)
plot.roc(obs.outputs, pred.outputs.glm, col="red", add=TRUE)
#Random Forest
roc.training.rf <- roc(as.numeric(df.vars.model$sga_10th), rf.reduce.alltrain.binary, ci=TRUE)
roc.cv.rf <- roc(obs.outputs, pred.outputs.rf, ci=TRUE)
plot.roc(df.vars.model$sga_10th,rf.reduce.alltrain.binary, col="blue", add=TRUE)
plot.roc(obs.outputs, pred.outputs.rf, ci=TRUE, col="darkgreen", add=TRUE)
legend("bottomright", legend=c(
paste("GLM Training: AUC",round(roc.training.glm$auc,3)),
paste("GLM Cross-Validation: AUC",round(roc.cv.glm$auc,3)),
paste("RF Training: AUC",round(roc.training.rf$auc,3)),
paste("RF Cross-Validation: AUC",round(roc.cv.rf$auc,3))),
col=c("black","red","blue","darkgreen"), lwd=2)
}Exploratory data analysis did not reveal signifance between any individual maternal/fetal characteristic and the sga_10th outcome variable. However, several points were noted:
Based on p-values generated from univariate linear regression, no placenta or non-placenta feature was found to have a statistically significant relationship (p < 0.05) with the sga_10th outcome variable. There was no clear transition in importance level in the initial random forest models generated for placenta and non-placenta features. Nonetheless, it was interesting that the placenta measurements with the lowest p-values and highest Gini scores included diameter measurements that were made across the maternal surface (rather than edge-to-edge 2D and 3D diameter measurements). The placenta features deemed most important captured a range of placenta morphology characteristics including maternal surface size, placenta thickness, flatness of the placenta rim, and placenta sphericity/convexity. Several placenta measurements had statistically significant pairwise associations as visualized in the correlograms, many of which were expected. For example, the fetal surface area and placenta volume were strongly correlated to maternal surface diameter metrics.
It is important to note that we perform shape feature selection by evaluating the relationships between the predictors and the outcome variable based on all available training data. This is not ideal and may be considered “double dipping” since we are using knowledge of each outcome to tailor the predictive model. In this project, shape feature selection on a subset of the entire data set would be difficult because of the small sample size. In the future expanded data set, this feature selection step will be performed excluding all test data.
The ROC analysis of the three model variations (placenta features only, non-placenta features only, placenta + non-placenta measurements as predictors of sga_10th) are shown below. Predition of the sga_10th outcome based on the selected placenta measurements was poor (AUC was 0.58 for logistic regression, 0.61 for random forest in 5-fold cross-validation). Using only maternal and fetal characteristics as predictors had a similar result. When the placenta measurements were combined with the maternal and fetal characteristics, the AUC modestly increased to 0.65 for logistic regression and 0.76 for random forest in the 5-fold cross-validation experiments. While the sample size used in this study is quite small to make any conclusions, these results suggest that combining placenta, fetal, and maternal characteristics into a predictive model will likely be more effective than predictions made on placenta volume estimates alone.
df.vars.shape.only <- df.merge[c(shape.cor.reduce,'sga_10th')]
df.vars.clinical.only <- df.merge[c(clinical.vars.cor.reduce,'sga_10th')]
df.vars.shape.plus.clinical <- df.merge[c(vars.all.cor.reduce,'sga_10th')]
model.shape.only <- model.eval(df.vars.model=df.vars.shape.only, "Model with placenta shape features as predictors")## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning in ci.auc.roc(roc, ...): ci.auc() of a ROC curve with AUC == 1 is always 1-1 and can be misleading.
model.clinical.only <- model.eval(df.vars.model=df.vars.clinical.only, "Model with non-placenta features as predictors")## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning in ci.auc.roc(roc, ...): ci.auc() of a ROC curve with AUC == 1 is always 1-1 and can be misleading.
## Warning in ci.auc.roc(roc, ...): ci.auc() of a ROC curve with AUC == 1 is always 1-1 and can be misleading.
## Warning in ci.auc.roc(roc, ...): ci.auc() of a ROC curve with AUC == 1 is always 1-1 and can be misleading.
model.shape.plus.clinical <- model.eval(df.vars.model=df.vars.shape.plus.clinical, "Model with placenta, maternal, and fetal features as predictors")## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type == : prediction from a rank-deficient fit may be misleading
## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type == : prediction from a rank-deficient fit may be misleading
## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type == : prediction from a rank-deficient fit may be misleading
## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type == : prediction from a rank-deficient fit may be misleading
## Warning in ci.auc.roc(roc, ...): ci.auc() of a ROC curve with AUC == 1 is always 1-1 and can be misleading.
## Warning in ci.auc.roc(roc, ...): ci.auc() of a ROC curve with AUC == 1 is always 1-1 and can be misleading.
## Warning in ci.auc.roc(roc, ...): ci.auc() of a ROC curve with AUC == 1 is always 1-1 and can be misleading.
The results of this study do not corroborate previous reports that first-trimester placenta volume measured on ultrasound is an independent predictor of low fetal birth weight. Since placenta volume estimates were made using two methods, the GE VOCAL software and manual image segmentation in ITK-Snap, we wanted to compare the two volume measurement methods and assess whether there is a statistically significant relationship between the VOCAL volume measurement and the sga_10th variable. Below is a summary of a logistic regression model that evaluates the VOCAL volume measurement, Vvocal, as an independent predictor of the sga_10th outcome. This result suggests that like the manual image segmentation volume measurement, Vsnap, the Vvocal measurement does not have a statistically significant relationship with sga_10th.
To explore the relationship between Vvocal and Vsnap, linear regression between the two variables was performed, illustrated in the interactive graph below. The interactive graph allows a user to scroll over individual cases and readily assess the difference in volume measurements and whether a given case belongs to the sga_10th positive category. It also displays each study ID number so that images from cases with large volume discrepancies can be quickly identified. This graph was used to identify a group of patients whose volume estimates using the two methods differed substantially. This issue is currently being troubleshooted and is challenging to assess since VOCAL export features have changed over the course of the project. Nonetheless, linear regression demonstrated a statistically significant linear relationship between Vvocal and Vsnap, summarized below (p < 1e-9).
# Logistic regression model relating primary outcome to VOCAL volume measurement
glm.vvocal = glm(sga_10th ~ Vvocal, data=df.merge,family=binomial())
summary(glm.vvocal)##
## Call:
## glm(formula = sga_10th ~ Vvocal, family = binomial(), data = df.merge)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.5615 -1.2370 0.8859 1.0627 1.4379
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.56591 1.47212 1.064 0.287
## Vvocal -0.02803 0.02800 -1.001 0.317
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 23.508 on 16 degrees of freedom
## Residual deviance: 21.967 on 15 degrees of freedom
## AIC: 25.967
##
## Number of Fisher Scoring iterations: 4
# Volume analysis
gvol <- ggplot(data=df.merge, aes(x=Vvocal,y=Vsnap)) +
geom_smooth(method = "lm", color="black",size=0.1) +
geom_point(aes(shape=sga_10th,text=paste("Study ID:",study_id)),color="black",size=2) +
scale_shape_manual(values=c(1,3)) +
labs(x="Vvocal (cc)",y="Vsnap (cc)") ## Warning: Ignoring unknown aesthetics: text
ggplotly(gvol)# Linear regression model relating VOCAL volume measurement to volume of image segmentation in ITK-Snap
glm.vol.comp = glm(Vvocal~Vsnap,data=df.merge)
summary(glm.vol.comp)##
## Call:
## glm(formula = Vvocal ~ Vsnap, data = df.merge)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -19.247 -14.616 -4.173 1.769 96.397
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 44.12 10.28 4.291 0.000644 ***
## Vsnap 28.75 24.83 1.158 0.265009
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 742.814)
##
## Null deviance: 12138 on 16 degrees of freedom
## Residual deviance: 11142 on 15 degrees of freedom
## AIC: 164.49
##
## Number of Fisher Scoring iterations: 2
This study investigated novel first-trimester placenta morphological measurements made on 3D ultrasound and evaluated their relationship to low fetal birth weight. In this work, the primary outcome variable was sga_10th (membership to the 10th percentile of birth weight normalized with respect to gestational age). Of 25 image-derived placenta shape features, none were found to be independently predictive of sga_10th; neither were 10 features related to maternal and fetal characteristics. After combining subsets of these features as predictors in logistic regression and random forest classifiers, it was found that a random forest model combining 5 placenta features and 9 non-placenta features had the best performance (AUC of 0.76). Interestingly we found that of the placenta shape features implemented, diameter measurements derived from the maternal surface of the placenta ranked highly in importance in logistic regression and random forest models. Neither placenta volume estimates made using manual image segmentation or commercial software were significanlty related to the outcome variable. It is likely that the small sample size in this study (with 17 patients being sga_10th positive) could explain these findings. It also may be the case that sga_10th is not the best outcome variable to characterize abnormally low birth weight. In future work and with the development of automated image segmentation methods, we will perform placenta shape analysis on a data set that is 10-fold larger using the R code implemented in this project.